Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games in literature. However, limiting applicability, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm. Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\tilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. Taking a divergent approach from literature, instead of working with the best-response map we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. Next, we prove that conditional TD-learning in $N$-agent games can learn value functions within $\tilde{\mathcal{O}}(\varepsilon^{-2})$ time steps. These results allow proving sample complexity guarantees in the oracle-free setting by only relying on a sample path from the $N$ agent simulator. Furthermore, we demonstrate that our methodology allows for independent learning by $N$ agents with finite sample guarantees.
translated by 谷歌翻译
降低策略梯度方法方差的梯度估计器已成为近年来增强学习研究的主要重点之一,因为它们允许加速估算过程。我们提出了一种称为Sharp的方差降低的策略梯度方法,该方法将二阶信息纳入随机梯度下降(SGD)中,并使用动量和时间变化的学习率。 Sharp Algorithm无参数,实现$ \ Epsilon $ - Appro-Appro-Approximate固定点,带有$ O(\ Epsilon^{ - 3})$的轨迹数,同时使用批量的大小为$ O(1)$迭代。与以前的大多数工作不同,我们提出的算法不需要重要的抽样,这可能会损害降低方差的优势。此外,估计错误的差异会以$ o(1/t^{2/3})$的快速速率衰减,其中$ t $是迭代的数量。我们广泛的实验评估表明,拟议算法对各种控制任务的有效性及其对实践中最新状态的优势。
translated by 谷歌翻译
梯度下降上升(GDA),最简单的单环路算法用于非凸起最小化优化,广泛用于实际应用,例如生成的对抗网络(GANS)和对抗性训练。尽管其理想的简单性,最近的工作表明了理论上的GDA的较差收敛率,即使在一侧对象的强凹面也是如此。本文为两个替代的单环算法建立了新的收敛结果 - 交替GDA和平滑GDA - 在温和的假设下,目标对一个变量的polyak-lojasiewicz(pl)条件满足Polyak-lojasiewicz(pl)条件。我们证明,找到一个$ \ epsilon $ -stationary点,(i)交替的GDA及其随机变体(没有迷你批量),分别需要$ o(\ kappa ^ {2} \ epsilon ^ { - 2})$和$ o(\ kappa ^ {4} \ epsilon ^ {-4})$迭代,而(ii)平滑gda及其随机变体(没有迷你批次)分别需要$ o(\ kappa \ epsilon ^ { - 2}) $和$ o(\ kappa ^ {2} \ epsilon ^ { - 4})$迭代。后者大大改善了Vanilla GDA,并在类似的环境下给出了单环算法之间的最佳已知复杂性结果。我们进一步展示了这些算法在训练GAN和强大的非线性回归中的经验效率。
translated by 谷歌翻译
本文开发了一种新颖的控制理论框架,用于分析Q学习的非渐近融合。我们表明,具有恒定阶梯大小的异步Q学习的动态可以自然地作为离散时间随机仿射系统。此外,Q学习估计误差的演变是由两个更简单的动态系统的轨迹过度而低估的。基于这两个系统,我们在使用常量步骤时,我们推出了异步Q-Learse的新的有限时间误差。我们的分析还阐明了Q-Learning的高估现象。我们进一步说明并通过数值模拟来验证分析。
translated by 谷歌翻译
在本文中,我们建立了双Q学习和Q学习的渐近于点误差之间的理论比较。我们的结果基于基于Lyapunov方程的线性随机近似的分析,并适用于表格设置和线性函数近似,但前提是最佳策略是唯一的,并且算法收敛。我们表明,如果双Q学习使用Q学习率的两倍,并输出了两个估计量的平均值,则双Q学习的渐近于点误差完全等于Q学习的误差。我们还使用模拟给出了这种理论观察的一些实际含义。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering result and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays high attention on the topic related words for topic extraction because of its self-attention architecture. Moreover, the training of enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
translated by 谷歌翻译
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
translated by 谷歌翻译
Capturing feature information effectively is of great importance in vision tasks. With the development of convolutional neural networks (CNNs), concepts like residual connection and multiple scales promote continual performance gains on diverse deep learning vision tasks. However, the existing methods do not organically combined advantages of these valid ideas. In this paper, we propose a novel CNN architecture called GoogLe2Net, it consists of residual feature-reutilization inceptions (ResFRI) or split residual feature-reutilization inceptions (Split-ResFRI) which create transverse passages between adjacent groups of convolutional layers to enable features flow to latter processing branches and possess residual connections to better process information. Our GoogLe2Net is able to reutilize information captured by foregoing groups of convolutional layers and express multi-scale features at a fine-grained level, which improves performances in image classification. And the inception we proposed could be embedded into inception-like networks directly without any migration costs. Moreover, in experiments based on popular vision datasets, such as CIFAR10 (97.94%), CIFAR100 (85.91%) and Tiny Imagenet (70.54%), we obtain better results on image classification task compared with other modern models.
translated by 谷歌翻译